ABSTRACT

Image processing is widely used to analyse the content of images. A typical pipeline consists
of image acquisition, pre-processing, feature extraction, and classification. In this paper,
these steps were used to detect human movement based on high-level feature extraction (HLFE).
This study proposed the use of background subtraction and frame difference, and was conducted
to compare the two methods for detecting human movement. Human movement was detected through
feature extraction using the centroid image technique. Furthermore, a support vector machine
(SVM) was used for classification.
Keywords: High-level feature extraction, Movement of human, Support vector machine

This is an open access article under the CC BY-SA license.

1. INTRODUCTION

Nowadays, many crimes such as robbery and fighting occur. Therefore, CCTV is installed
in high-crime areas and a human operator is assigned to monitor the situation [1]. However, humans
are prone to error, tire easily, and might miss a crime event. This research focuses on distinguishing
between walking and running human movement using background subtraction and frame difference.
The problem in digital image processing is that a human is required to monitor the behaviour
of subjects in scenes that are usually complex. Manual observation by a human is not appropriate,
however, as it requires attentive and careful concentration over a long period of time [2]. An automated
surveillance system is therefore needed to detect human behaviour, and this research compares human
detection methods for recognising human movement in digital image processing. S. A. Dhole proposed
a pipeline of image acquisition, image pre-processing, image segmentation, feature extraction,
and classification, but it is not suitable for this research.

The image acquired in this research is a video of a human walking and running. Pre-processing
is the process of enhancing an image. It involves noise removal, segmentation, and other processes [3].
Feature extraction is the step after pre-processing, and here it focuses on high-level feature extraction
(HLFE). The two categories of feature extraction are HLFE and low-level feature extraction (LLFE).
LLFE is based directly on motion, while HLFE is based on shapes [4]. This research focused on shape-based
feature extraction by applying background subtraction and frame difference.
Kavitha and Tejaswini presented motion detection that overcomes the disadvantages of the background
subtraction algorithm [5]. Their work presents a robust and efficiently computed background subtraction
algorithm that is able to cope with local illumination changes, such as shadows and highlights, as well
as global illumination changes. The first step in background subtraction is to initialise the background
image with a frame that contains no motion or object. In this technique, it is enough to choose one frame
as the background image. In the frame difference method, the frame being subtracted may itself contain
motion, as it takes the place of the background image. To ensure that the feature used is suitable,
a verification process through classification should be used. The support vector machine (SVM) technique
was chosen in this study since it is well suited to validating features.


2. RELATED WORK 

Images that can be processed by a computer are called digital images, and they are obtained
through image acquisition [6]. To meet the demand for high speed, high efficiency, and high density,
an image acquisition system was developed to realise automatic acquisition; it provides a variety
of data-sharing interfaces to allow direct docking with a database [7]. A colour image can be represented
by colour histograms that contain a one-column array of 256 elements [8]. RGB-to-greyscale conversion
is used to reduce the number of bits per pixel, because an RGB pixel has 24 bits, with each channel
holding a value from 0 to 255. This is a disadvantage of RGB; therefore, greyscale is used for better
performance. Converting a colour image into a greyscale image converts the 24-bit RGB values into
8-bit greyscale values [9]. Greyscale reduces processing time because the storage per pixel is reduced
from 24 bits to 8 bits.
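The 24-bit to 8-bit conversion described above can be sketched in a few lines of Python (used here for illustration; the experiments in the paper were run in Matlab). The channel weights are the ITU-R BT.601 luma coefficients, which are an assumption, since the paper does not state the weights it used. The function names are hypothetical.

```python
def rgb_to_grey(pixel):
    """Convert one 24-bit RGB pixel (three 0-255 channels) to one 8-bit grey value."""
    r, g, b = pixel
    # Assumed weights: ITU-R BT.601 luma coefficients (not stated in the paper).
    grey = 0.299 * r + 0.587 * g + 0.114 * b
    return min(255, max(0, int(round(grey))))


def image_to_grey(rgb_image):
    """Apply the conversion to a 2-D list of (r, g, b) tuples."""
    return [[rgb_to_grey(p) for p in row] for row in rgb_image]


# Tiny 2x2 example: white, black, pure red, pure blue.
rgb_image = [[(255, 255, 255), (0, 0, 0)],
             [(255, 0, 0), (0, 0, 255)]]
grey_image = image_to_grey(rgb_image)
```

Each output pixel needs a single 0-255 value instead of three, which is the storage reduction from 24 bits to 8 bits per pixel mentioned in the text.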

A background image is an image of the video scene that contains no moving object. A background
model can be obtained in two steps. The first step is background initialisation, where
the background image is obtained from a specific time in the video sequence, and the second step
is frame difference, where the background is updated to follow the changes that may occur in the real
scene [10]. Background subtraction and frame difference were used in this research. The basic way to
separate a moving object from its background is to subtract the background from the image, leaving just
the moving object, or foreground [11]. The foreground pixels can be extracted by background subtraction;
thus, the moving object, which is the human in this research, can be defined by:


$H(y) = \sum_{x} f(x, y)$ (1)


$H(x) = \sum_{y} f(x, y)$ (2)


The process of background subtraction or frame difference consists of two steps. The first step
is thresholding H(x) and H(y). The first derivative is then computed, defined by
$H'[n] = H[n] - H[n-1]$, which marks where the pixels change from 1 to 0 or vice versa, in order
to get the values [12]. Rakibe and Patil presented motion detection by developing a new algorithm based
on the background subtraction algorithm [13]. First, a reliable background model was built using
statistical methods; the current image was then subtracted from the background image, and the result
was thresholded. Moving objects were detected once the background image had been obtained. After that,
morphological filtering was applied to remove the noise and solve the background interruption difficulty.
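The thresholding and first-derivative steps can be illustrated with a small Python sketch. It assumes that H(x) and H(y) are the column and row sums of a binary foreground mask; this reading, the function names, and the threshold value of 1 are assumptions made for illustration only.

```python
def projections(binary):
    """Return (H_x, H_y): column sums and row sums of a 2-D 0/1 image."""
    h_x = [sum(col) for col in zip(*binary)]
    h_y = [sum(row) for row in binary]
    return h_x, h_y


def threshold(profile, t):
    """Binarise a projection: 1 where the value reaches t, else 0."""
    return [1 if v >= t else 0 for v in profile]


def transitions(bits):
    """First difference d[n] = bits[n] - bits[n-1].

    +1 marks a 0 -> 1 change and -1 marks a 1 -> 0 change.
    """
    return [bits[n] - bits[n - 1] for n in range(1, len(bits))]


# A 3x2 foreground block inside a 5x6 mask.
binary = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
h_x, h_y = projections(binary)
edges_x = transitions(threshold(h_x, 1))
edges_y = transitions(threshold(h_y, 1))
```

The positions of the +1 and -1 entries give the left/right and top/bottom limits of the moving object under these assumptions.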

Since binary images are easy to operate on, images in other storage formats are often converted into
binary images when they are used for enhancement or edge detection [14]. However, each sample has one
of the results that contains the desired object. In [13], the greyscale image was filtered and binarised
in order to remove holes: the grey image was passed through a Gaussian low-pass filter to obtain
a filtered image, which was then binarised using a binary threshold to produce a binary image.


Tao (&, y=lay (3) 


where (x, y) represents the pixel coordinates in the image. In image processing, noise commonly appears
in an image and exists in various types and characteristics [15]. Filtering is a fundamental operation
that is used to achieve many tasks such as noise reduction, interpolation, and resampling [16].
This research used the Sobel gradient magnitude for edge detection; the gradient method detects edges
by looking for the maximum and minimum in the first derivative of the image.
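As an illustration of the Sobel gradient magnitude, the following Python sketch applies the standard 3x3 Sobel kernels to a greyscale image stored as a list of lists. Leaving the border pixels at zero is a simplification assumed here, and the function name is hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    v = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out


# Vertical step edge: dark left half, bright right half.
image = [[0, 0, 255, 255] for _ in range(4)]
magnitude = sobel_magnitude(image)
```

The response is large at the step between the dark and bright columns and zero in flat regions, which is how the first derivative exposes edges.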

The Gaussian smoothing filter is considered a “perfect” blur for many applications, provided that
the kernel support is large enough to fit the essential part of the Gaussian [17]. When a Gaussian filter
is used for noise suppression, a large filter variance is effective in smoothing out noise, but at
the same time it distorts those parts of the image where there are abrupt changes in pixel
brightness [18]. This research used a two-dimensional filter, where the Gaussian distribution
in one-dimensional form is:


$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^{2}}{2\sigma^{2}}}$ (4)

where $\sigma$ is the standard deviation of the distribution. Based on the one-dimensional Gaussian
filter, the two-dimensional digital Gaussian filter can be expressed as:

$G(x, y) = \frac{1}{2\pi\sigma^{2}} e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$ (5)

where $\sigma^{2}$ is the variance of the Gaussian filter, and the size of the filter kernel
$l$ ($-l \le x, y \le l$) is often determined by omitting values lower than five percent
of the maximum value of the kernel.
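A minimal Python sketch of building such a kernel is given below. It assumes that the five percent rule is applied along one axis to choose the half-width, and that the kernel is normalised to sum to 1 in place of the analytic prefactor; both choices, and the function name, are assumptions for illustration.

```python
import math


def gaussian_kernel(sigma, cutoff=0.05):
    """2-D Gaussian kernel truncated where the 1-D profile drops below
    `cutoff` times its peak, then normalised so the weights sum to 1."""
    # exp(-d^2 / (2 sigma^2)) >= cutoff  <=>  d <= sigma * sqrt(-2 ln(cutoff))
    half = int(math.floor(sigma * math.sqrt(-2.0 * math.log(cutoff))))
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


kernel = gaussian_kernel(sigma=1.0)
```

With sigma = 1.0 the half-width works out to 2, so the kernel is 5x5. A larger sigma gives a wider kernel and stronger smoothing, at the cost of blurring abrupt brightness changes.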

This research used the two-dimensional filter to reduce the noise in the images. Intelligent
monitoring systems not only detect motion but also separate moving objects in the foreground
from the stationary objects in the background [19]. The goal of motion detection is to recognise
the motion of objects found in two given images [13]. In this research, motion detection depends on
the process applied prior to background detection, which is the Gaussian smoothing filter. This paper
used binary large object (BLOB) detection, an image processing technique, to mark each detected region
as human or non-human [20]. An accurate computation of a BLOB's centre point depends on the accuracy
of BLOB detection, particularly for BLOBs that are not perfectly circular or square-shaped [21].
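BLOB detection on a binary image is commonly done with connected-component labelling. The Python sketch below uses a breadth-first flood fill with 4-connectivity; the connectivity, the function names, and the choice of keeping the largest BLOB are assumptions, not details taken from the paper.

```python
from collections import deque


def label_blobs(binary):
    """Return a list of blobs, each a list of (row, col) pixels (4-connectivity)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, blob = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def bounding_box(blob):
    """Return (top, left, bottom, right) of a blob, e.g. for cropping."""
    ys = [p[0] for p in blob]
    xs = [p[1] for p in blob]
    return min(ys), min(xs), max(ys), max(xs)


binary = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
blobs = label_blobs(binary)
largest = max(blobs, key=len)
```

The bounding box of the selected BLOB can then be used to crop the region of interest before the centroid is computed.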

HLFE was used in this research to identify the object, after which a description of the true object
can be obtained [20]. The moving objects are then tracked, and finally a human model, “Star
Skeletonisation”, is applied to detect human objects and analyse their motion [10]. The centre
of the image area in which a human is moving is called the centroid. The centroid is located at
position (Cx, Cy), where Cx and Cy are respectively the averages of the x and y coordinates of all
white pixels within the boundary of the shape, which consists of a series of boundary points [22].
Layer G. and others developed a framework for detecting suspicious events in a video through three
steps, LLFE, event classification, and event analysis, under one assumption: unlabelled video
sequences are known to contain only one event [23].
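The centroid definition translates directly into code. The following Python sketch averages the x and y coordinates of all white pixels in a binary image; the function name and the use of 1 for white pixels are assumptions.

```python
def centroid(binary):
    """Return (Cx, Cy): mean x and mean y over all white (value 1) pixels."""
    xs, ys = [], []
    for y, row in enumerate(binary):
        for x, v in enumerate(row):
            if v == 1:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("image contains no white pixels")
    return sum(xs) / len(xs), sum(ys) / len(ys)


# A 3x2 white rectangle inside a 5x4 image.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cx, cy = centroid(shape)
```

For the rectangle in the example, the white pixels span x = 1 to 3 and y = 1 to 2, so the centroid lies at their midpoint.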

SVM was used in this research; it was trained using the optimised centroidal profiles of both
categories to investigate their relative influence on human shape recognition [24]. The classification
of pixels for background subtraction and frame difference is evaluated with a receiver operating
characteristic (ROC) curve, and the SVM error can be used to differentiate between the data from
background subtraction and frame difference. An ROC curve is a graphical plot that reveals
the performance of a binary classifier. The curve is drawn by plotting the true positive rate against
the false positive rate at various threshold settings [25].
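How an ROC curve and its area are obtained from classifier scores can be sketched as follows in Python. The scores and labels are made-up illustrative values, not data from this study, and the trapezoidal rule is one common way of computing the area under the curve.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by sweeping the decision
    threshold over every distinct score, highest first.
    Labels are 1 (positive) or 0 (negative)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
```

A classifier that ranks every positive sample above every negative one yields an area of 1, while an area near 0.5 indicates performance close to chance.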


3. METHODOLOGY 

High-level feature extraction starts with image acquisition, which provides the input image
of the study. Pre-processing is a critical step for obtaining a good image that is free of noise.
The feature extraction in this research used the centroidal image technique, which is the step that
obtains the characteristics of the image. It was used to identify the feature, which was the objective
of the research. Classification of the image is the last step in digital image processing, and this
research used SVM. Figure 1 shows the overall block diagram of this research. All experiments were
conducted using Matlab R2017a, and the simulations were run on an Intel Core i7-6700K CPU with
a 4.00 GHz working frequency and 8 GB of RAM.


3.1. Image acquisition 

The image acquisition for this research was conducted using a video with a size of 640x320 pixels.
The video, which shows a human walking and running, was converted to an image sequence that served
as the input. This made it easier to analyse background subtraction and frame difference frame
by frame.

Figure 1. Summary for tracking individual activities to track suspicious activities


3.2. Pre-processing 

A greyscale image was used for the subtraction of image matrices. Using a greyscale image was easier
because the image consists of a data matrix whose values represent intensities, ranging from 0 to 1,
or from 0 to 255 for uint8. This research proposed the use of background subtraction. Two categories
of subtraction were used: background subtraction using the first frame of the image sequence,
and frame difference. The video was recorded starting with no object in the scene so that several
frames could be selected as background frames. Equation (6) shows the background subtraction equation:


$BS(x, y, t) = I(x, y, t) - B(x, y, t)$ (6)


where t is the time, BS is the background subtraction result, I is the image at time t, and B
is the background image at time t, which is set to frame number 1. For frame difference,
the subtraction is performed frame by frame, with the previous frame serving as the background
estimate. A morphological technique was then applied to segment the images after the subtraction
process, for which the images were converted to binary images.
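The difference between the two subtraction schemes can be sketched in Python as follows. The sketch assumes that the absolute difference is thresholded to obtain a binary mask; the threshold of 30, the function names, and the toy frames are assumptions for illustration.

```python
def subtract(frame, reference, threshold=30):
    """Binary mask: 1 where |frame - reference| exceeds the threshold."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, reference)]


def background_subtraction(frames, threshold=30):
    """Subtract the first frame (empty scene) from every later frame."""
    background = frames[0]
    return [subtract(f, background, threshold) for f in frames[1:]]


def frame_difference(frames, threshold=30):
    """Subtract each frame's predecessor from it."""
    return [subtract(cur, prev, threshold)
            for prev, cur in zip(frames, frames[1:])]


def count_pixels(mask):
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


# 1x5 greyscale frames: an empty scene, then a bright object moving right.
frames = [
    [[10, 10, 10, 10, 10]],
    [[10, 200, 200, 10, 10]],
    [[10, 10, 200, 200, 10]],
]
bs_masks = background_subtraction(frames)
fd_masks = frame_difference(frames)
```

In the last frame, background subtraction marks both pixels of the object, whereas frame difference marks only the pixels that changed since the previous frame and misses the overlapping part of the object.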


3.3. Feature extraction 

Feature extraction builds on the enhanced image produced by pre-processing. It is also known
as shape-based or foreground detection. The output image after feature extraction differs from
the input image. Feature extraction describes the large amount of data in the image and reduces it,
through a combination of various techniques, while achieving sufficient accuracy. In this research,
HLFE was proposed to detect human movement, either walking or running, in order to determine
the normal and abnormal movement of a human. This paper used the centroid image as the extracted
feature. The centroid image can determine whether the movement is by a human or not. This was
demonstrated with the centroidal profiles of 100 human and non-human images, extracted at 10°
intervals as shown in Figure 2, which yielded 36 centroidal feature profiles [24]. The centroid image
was used to determine the movement of a human based on the centroidal angle [24], as shown in Figure 3.

The features are the 80°, 90°, 100°, and 110° profiles, which are located in Region 1, followed
by four more features in Region 2, located at the 180°, 190°, 200°, and 210° angle intervals.
Region 3 has seven features, located at the 250°, 260°, 270°, 280°, 290°, 300°, and 310° angle
intervals, and finally four features are positioned at the angle intervals of 350°, 360°, 10°,
and 20° in Region 4, as shown in Figure 3.
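One possible way to compute a 36-bin centroidal profile and pick out the region features listed above is sketched below in Python. The profile definition used here (the largest centroid-to-pixel distance in each 10° direction) is an assumption, since the exact construction is given in [24] rather than in this paper; the function names are hypothetical.

```python
import math

# Angle intervals of the four regions, as listed in the text.
REGION_ANGLES = {
    1: [80, 90, 100, 110],
    2: [180, 190, 200, 210],
    3: [250, 260, 270, 280, 290, 300, 310],
    4: [350, 360, 10, 20],
}


def centroidal_profile(binary):
    """Map each angle 10, 20, ..., 360 to the largest centroid-to-pixel
    distance among white pixels whose direction rounds to that angle."""
    pixels = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v == 1]
    cx = sum(p[0] for p in pixels) / len(pixels)
    cy = sum(p[1] for p in pixels) / len(pixels)
    profile = {angle: 0.0 for angle in range(10, 361, 10)}
    for x, y in pixels:
        dx, dy = x - cx, cy - y  # flip y so angles grow counter-clockwise
        if dx == 0 and dy == 0:
            continue  # the centroid pixel itself has no direction
        degrees = math.degrees(math.atan2(dy, dx)) % 360.0
        angle = int(round(degrees / 10.0)) * 10 % 360 or 360
        profile[angle] = max(profile[angle], math.hypot(dx, dy))
    return profile


def region_features(profile):
    """Select the 19 profile values that belong to Regions 1 to 4."""
    return [profile[a] for region in sorted(REGION_ANGLES)
            for a in REGION_ANGLES[region]]


# Plus-shaped test image whose arms point at 90, 180, 270 and 360 degrees.
cross = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
profile = centroidal_profile(cross)
features = region_features(profile)
```

The four regions contribute 4 + 4 + 7 + 4 = 19 of the 36 profile values to the feature vector.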


Bulletin of Electr Eng & Inf, Vol. 9, No. 1, February 2020: 345-353 


Bulletin of Electr Eng & Inf ISSN: 2302-9285







Figure 2. Example of the generalized centroidal profile for shape modelling [24]

Figure 3. The generalized centroidal gait profile representing the four regions [24]


3.4. Classification 

SVM was used to classify human movement as abnormal or normal based on the results of HLFE.
The pixels of the background subtraction and frame difference sequences were each tested
and analysed. The prediction for the HLFE images in the background subtraction and frame difference
sequences was computed for evaluation purposes. The classification of normal and abnormal movement
was assessed using the SVM error. Ten classification tests were run for each of background
subtraction and frame difference.
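To make the notion of SVM error concrete, the Python sketch below trains a linear SVM by full-batch sub-gradient descent on the regularised hinge loss and reports the fraction of misclassified samples. It is a toy illustration rather than the Matlab classifier used in the experiments: the training procedure, the hyperparameters, and the one-dimensional, zero-centred features are all hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss.
    Labels must be +1 (abnormal) or -1 (normal)."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Samples that violate the margin contribute to the sub-gradient.
        active = [(x, y) for x, y in zip(samples, labels)
                  if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1.0]
        grad_w = [lam * w[i] - sum(y * x[i] for x, y in active) / n
                  for i in range(dim)]
        grad_b = -sum(y for _, y in active) / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def error_rate(w, b, samples, labels):
    """SVM error: fraction of samples whose prediction differs from the label."""
    wrong = sum(1 for x, y in zip(samples, labels) if predict(w, b, x) != y)
    return wrong / len(samples)


# Hypothetical zero-centred features: low values = normal, high = abnormal.
train_x = [[-1.0], [-0.8], [-0.6], [0.6], [0.8], [1.0]]
train_y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(train_x, train_y)

test_x = [[-0.9], [-0.5], [0.3], [0.7], [-0.2]]
test_y = [-1, -1, 1, 1, 1]
train_error = error_rate(w, b, train_x, train_y)
test_error = error_rate(w, b, test_x, test_y)
```

In this toy test set, one of the five samples falls on the wrong side of the boundary, so the error is 0.2; an error of 0 means every test sample was classified correctly.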


4. RESULTS AND DISCUSSION 
The results for this research were divided into three sections: pre-processing, feature extraction, 
and classification. 


4.1. Pre-processing 

The pre-processing in this research covered the images converted from a video, which were processed
using the multiple techniques stated previously in the methodology. The images were processed
to obtain the movement of a human in the frames based on the background subtraction and frame
difference sequences. The results for the pre-processing techniques are shown in Table 1. Table 1
shows that background subtraction is better than frame difference for pre-processing: human movement
can be detected more efficiently with background subtraction than with frame difference.


4.2. Feature extraction 

Feature extraction operates on the image obtained from pre-processing. This research used the centroid
image technique for feature extraction and confirmation of the human body [26]. The centroid images
are illustrated in Figure 4. The centroid is the centre of the image, based on the cropped image.
The centroid image was divided over 360° into angular intervals of 10°. The angle in the centroid
image was calculated and then used to classify the image as abnormal or normal according to
the change in angle. An image is flagged as abnormal when the angle changes by a large amount from
frame to frame; if the movement is normal, the change in angle from frame to frame is small.


4.3. Classification 

The classification of an image is important to ensure that the data are correct. The movement
of a human in the frames was measured by the number of pixels. The pixel counts for background
subtraction and frame difference were calculated and are presented in Figure 5. Based on Figure 5,
the background subtraction technique is more stable than frame difference because the pixel counts
from frame difference are low compared with those from background subtraction. There was a zero
reading in frames 0 to 40 because no movement was detected in those frames. Both techniques are good
at detecting movement in image processing, but for this research, background subtraction is better
than frame difference because the movement of a human is more stable in background subtraction than
in frame difference.


Table 1. Results for pre-processing techniques
Columns: Background Subtraction, Frame Difference
Rows: (a) Greyscale image, (b) Background subtraction/frame difference, (c) Binary image,
(d) Sobel gradient magnitude, (e) Smoothed gradient magnitude, (f) Motion detection,
(g) BLOB detection, (h) Cropped image



Based on Figure 6, the error of background subtraction is lower than that of frame difference:
the average error of background subtraction is 0.05, while that of frame difference is 0.24.
The highest error for background subtraction is 0.20, or 20%, while that for frame difference is 0.45,
or 45%, of the tests. This shows that the data from background subtraction are good and the data from
frame difference are not stable. The area under the curve for SVM is produced from the results
of the ROC curve.
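The reported error values can be turned into accuracy values with a short calculation. The sketch below assumes that accuracy is simply one minus the SVM error, which the paper does not state explicitly; the dictionary layout and function name are illustrative.

```python
def accuracy_from_error(error):
    """Accuracy taken as the complement of the misclassification rate
    (an assumption; the paper does not define accuracy explicitly)."""
    return 1.0 - error


# Error values reported in the text for Figure 6.
reported = {
    "background subtraction": {"average": 0.05, "highest": 0.20},
    "frame difference": {"average": 0.24, "highest": 0.45},
}
accuracy = {
    method: {kind: round(accuracy_from_error(err), 2)
             for kind, err in stats.items()}
    for method, stats in reported.items()
}
```

Under this assumption, the average errors correspond to accuracies of 0.95 and 0.76, and the highest errors to 0.80 and 0.55. The 95% and 55% figures quoted in the conclusions match the average error of background subtraction and the highest error of frame difference, respectively.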





Figure 4. Centroid images for (a) background subtraction and (b) frame difference sequences


Figure 5. Graph of background subtraction vs. frame difference sequences (vertical axis: number of pixels)


Based on Figure 7, background subtraction has good and accurate data because the values of the area
under the curve approach 1, or 100%; the lowest value is 0.80, which is acceptable. The data
for frame difference are not good based on the area under the curve. This is due to the low stability
of frame difference, as shown in Figure 7.


Figure 6. SVM error for background subtraction vs. frame difference sequences (axes: number
of error vs. number of testing; series: SVM error for background subtraction, SVM error
for frame difference)

Figure 7. Area under the curve for SVM of background subtraction vs. area under the curve
for SVM frame difference


Figure 8 shows the ROC curve produced by SVM for background subtraction in test number six,
where the error is 0. The area under the curve for this test in Figure 7 is 1, which agrees with
the ROC curve in Figure 8. Figure 9 shows the ROC curve produced by SVM for frame difference
in test number three, where the error is 0.33. The area under the ROC curve in Figure 9 is 0.67.

Figure 8. ROC curve by SVM based on background subtraction

Figure 9. ROC curve by SVM based on frame difference


5. CONCLUSIONS 

Based on the HLFE results for background subtraction and frame difference, background subtraction
is the better technique for HLFE. Normal and abnormal human movement were classified from
the results of feature extraction, where the SVM error of background subtraction was lower than that
of frame difference. The accuracy of background subtraction was 95%, which makes it more accurate
than the frame difference sequences, which had 55% accuracy. Therefore, background subtraction should
be implemented in automated human detection, and it can be concluded that background subtraction
is better than frame difference for human detection. Future work will analyse further classification
techniques and combine HLFE for detecting normal and abnormal human movement with other classification
methods, in order to build a complete system such as a surveillance application.